Introduction

The plot I decided to improve, was found under this link https://www.visualcapitalist.com/the-worlds-biggest-startups-top-unicorns-of-2021/. I see that there are a whole lot of information that could be represented there, but were not (for example, author could have analyzed the industries as a whole, not only single start-ups)

My Solution

Firstly, I have found the data frame on Kaggle, under this link: https://www.kaggle.com/datasets/ramjasmaurya/unicorn-startups

unicorns_data <- read.csv("~/Downloads/unicorns till sep 2022.csv")

library(plotly)
library(dplyr)

Then, I prepared the data:

unicorns_data$Valuation...B. <- gsub("\\$", "", unicorns_data$Valuation...B.)
unicorns_data$Valuation...B. <- gsub(",", "", unicorns_data$Valuation...B.)
unicorns_data$Valuation...B. <- as.numeric(unicorns_data$Valuation...B.)


high_value_startups <- subset(unicorns_data, Valuation...B. > 10)

industry_valuation <- high_value_startups %>%
  group_by(Industry) %>%
  summarise(TotalValuation = sum(Valuation...B.)) %>%
  arrange(desc(TotalValuation))


high_value_startups$Industry <- factor(high_value_startups$Industry,
                                       levels = industry_valuation$Industry)

country_valuation <- high_value_startups %>%
  group_by(Country) %>%
  summarise(TotalValuation = sum(Valuation...B.)) %>%
  arrange(TotalValuation)


high_value_startups$Country <- factor(high_value_startups$Country,
                                       levels = country_valuation$Country)



high_value_startups$Jittered_Industry <- jitter(as.numeric(as.factor(high_value_startups$Industry)))
high_value_startups$Jittered_Country <- jitter(as.numeric(as.factor(high_value_startups$Country)))

Here, I sorted the industries and countries by total valuation, so we could have more accurate representation of global VC-economy.

fig <- plot_ly(high_value_startups,
               x = ~Jittered_Industry,
               y = ~Jittered_Country,
               size = ~Valuation...B.,
               color = ~Industry,
               text = ~paste(Company, ":", Valuation...B., "B$",", ",Country,", ",City.,"\nInvestors: ",Investors),
               hoverinfo = "text",
               type = 'scatter',
               mode = 'markers',
               showlegend=FALSE)


fig <- fig %>% layout(title = "Valuation of Startups by Industry and Country",
                      xaxis = list(title = "Industry", tickmode = "array", tickvals = jitter(as.numeric(as.factor(unique(high_value_startups$Industry)))), ticktext = unique(high_value_startups$Industry)),
                      yaxis = list(title = "Country", tickmode = "array", tickvals = jitter(as.numeric(as.factor(unique(high_value_startups$Country)))), ticktext = unique(high_value_startups$Country)))


fig

Final plot. Here, in my opinion I made it more accurate. Because original graph showed us China’s e-commerce companies at the top of the list, and suggested domination of China’s VC field. But in reality we can see that investors in 2022 preferred fintech as main startups’ industry to invest, also we can see the list of VC funds that invested in each startup, and we can make a conclusion, that still US and European global VC funds tend to invest in US or Europe.

This conclusions could not have been made from the original plot.